GH-343: Fix BaseVariableWidthVector and BaseLargeVariableWidthVector offset buffer serialization#989
Conversation
This comment has been minimized.
This comment has been minimized.
jbonofre
left a comment
There was a problem hiding this comment.
It's the similar fix as for valueBuffer afair. It sounds good to me.
|
@Yicong-Huang sorry to be a pain, can you please rebase ? Thanks ! |
Thanks @jbonofre . I did rebase but I think the root error in this change is |
|
@Yicong-Huang can you please rebase again ? Sorry about that. Else I can do the rebase for you. |
Thanks. Just merged the latest master back in. |
|
@Yicong-Huang thanks a lot ! I would like to include this fix on Arrow Java 19.0.0 release 😄 |
|
Closes #343 |
Thanks that'd be nice! I can work with you closely on fixing this. However I see it is still failing CI, I think the original issue is still persist
Do you have any suggestions? |
|
I did a new pass on the PR and I don't think it's correct.
I think the problem is that the PR changes |
jbonofre
left a comment
There was a problem hiding this comment.
See my previous comment about the tests.
What's Changed
Fix
BaseVariableWidthVector/BaseLargeVariableWidthVectorIPC serialization whenvalueCountis 0.Problem
When
valueCount == 0,setReaderAndWriterIndex()was settingoffsetBuffer.writerIndex(0), which meansreadableBytes() == 0. IPC serializer usesreadableBytes()to determine buffer size, so 0 bytes were written to the IPC stream. This crashes IPC readers in other libraries because Arrow spec requires offset buffer to have at least one entry[0].This is a follow-up to #967 which fixed the same issue in
ListVector/LargeListVector.Fix
Simplify
setReaderAndWriterIndex()to always use(valueCount + 1) * OFFSET_WIDTHfor offset buffer'swriterIndex. WhenvalueCount == 0, this correctly setswriterIndextoOFFSET_WIDTH, ensuringoffset[0]is included in serialization.Testing
Added tests for empty
VarCharVectorandLargeVarCharVectorverifying offset buffer has correctreadableBytes().related to #343